Over the past decade, health insurance, and specifically whether the government should provide single-payer healthcare, has been a major topic of political debate. In the US, medical expenses are a primary cause of bankruptcy (Himmelstein 2019). Access to affordable healthcare is a major determinant of an individual’s wellbeing, particularly for people with preexisting conditions, exposures, or vulnerabilities that make them more susceptible to illness or make recovering from setbacks more difficult. Many people in the US, including within the Bay Area, do not have health insurance, which leaves certain populations especially vulnerable during a global pandemic. Beyond gaps in healthcare coverage, the environmental contaminants that communities are exposed to can also increase their vulnerability to illness. In the United States, heart disease is a leading cause of death, and has been for the past decade (Healthline 2018). Environmental causes of heart disease include particulate air pollution, pesticides, and metals that may appear in drinking water (Cosselman 2015).
Here I will explore how health insurance coverage varies for different demographic groups in the Bay Area, specifically for populations of different races. I will then study potential environmental determinants of cardiovascular disease, specifically pesticide exposure, water quality, and PM2.5 exposure, to determine which indicator is the best predictor of cardiovascular disease.
From this, we can see that as a whole across the nine counties in the Bay Area, white individuals have a disproportionately low likelihood of not having health insurance. Interestingly, the only other group that appears to be significantly disproportionately represented is the population of “some other race alone,” which is disproportionately more likely to not have health insurance. Some might say that there is, generally, equity in health insurance coverage. However, true equity might look like all individuals having healthcare.
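As a rough illustration of what "disproportionately represented" means here, the sketch below compares each group's share of the uninsured population with its share of the total population. The counts and column names are hypothetical placeholders, not the data used in this report.

```r
# Hypothetical counts for illustration only; these are NOT the report's data.
coverage <- data.frame(
  race      = c("White alone", "Asian alone", "Some other race alone"),
  insured   = c(900, 450, 160),
  uninsured = c(40, 30, 40)
)

coverage$total <- coverage$insured + coverage$uninsured

# Share of the whole population vs. share of the uninsured population.
coverage$pop_share       <- coverage$total / sum(coverage$total)
coverage$uninsured_share <- coverage$uninsured / sum(coverage$uninsured)

# Ratio above 1: over-represented among the uninsured; below 1: under-represented.
coverage$representation <- coverage$uninsured_share / coverage$pop_share

print(coverage[, c("race", "pop_share", "uninsured_share", "representation")])
```

A ratio near 1 for every group would correspond to proportional representation, which is still different from everyone being covered.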
Mapping healthcare coverage will help us understand how equity in coverage varies over space.
Percent of white and non-white people with healthcare coverage in the Bay Area
As we can see, the percent of households with health insurance is fairly high in both demographic groups, with the majority of Census Tracts averaging about 90%. To get a better understanding of how the two groups vary, we can map the difference between the percent of white people and the percent of non-white people with healthcare.
Difference between percent of white people and percent of nonwhite people with healthcare coverage
This map shows that the majority of land area falls in census tracts with a slightly negative difference, meaning that a greater proportion of non-white people than white people have healthcare. However, in the areas where more white people have healthcare, the inequity tends to be greater, reaching close to 30 percentage points in some areas. Additional mapping and data could be useful in understanding why the map looks this way, but overall it suggests that there may not be a strong correlation between race and whether or not an individual has healthcare in large parts of the Bay Area.
It is also important to interpret this map with the context of the equity analysis, which showed, as a whole, white people were less likely not to have healthcare. The map suggests that this regional inequity in healthcare coverage is driven by local inequity that is concentrated in small pockets, and in those areas the issue is quite pronounced.
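The per-tract difference shown in the map can be sketched as follows. This is a minimal example with made-up tract counts and hypothetical column names; the actual analysis may have been structured differently.

```r
# Hypothetical tract-level counts, for illustration only.
tracts <- data.frame(
  tract            = c("A", "B", "C"),
  white_total      = c(1000, 800, 500),
  white_insured    = c(950, 720, 480),
  nonwhite_total   = c(600, 900, 700),
  nonwhite_insured = c(580, 600, 650)
)

# Percent of each group with coverage in each tract.
tracts$pct_white    <- 100 * tracts$white_insured / tracts$white_total
tracts$pct_nonwhite <- 100 * tracts$nonwhite_insured / tracts$nonwhite_total

# Positive: a larger share of white residents is covered.
# Negative: a larger share of non-white residents is covered.
tracts$diff <- tracts$pct_white - tracts$pct_nonwhite

print(tracts[, c("tract", "pct_white", "pct_nonwhite", "diff")])
```

The difference is expressed in percentage points, so a value of 20 means a 20-point gap in coverage rates, not a 20% relative change.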
Next, we will bring in data from CalEnviroScreen to look at the interplay between healthcare coverage and risk of cardiovascular disease.
For CalEnviroScreen purposes, rate of cardiovascular disease is measured as “Spatially modeled, age-adjusted rate of emergency department (ED) visits for AMI per 10,000 (averaged over 2015-2017).” It is important to recognize that people without health insurance are probably less likely to visit the emergency room, even if they are in need, which makes this a somewhat unreliable measure of cardiovascular disease rates, especially for the analysis we are doing here.
Rates of cardiovascular disease in the Bay Area
Some of the highest rates exist in the East Bay, specifically around Hayward, Richmond, and Vallejo, as well as further inland near Antioch and Fairfield.
Next, we will consider different environmental indicators as predictors of cardiovascular disease.
Positive values indicate that increases in one variable are associated with increases in the other. So, increases in pesticide concentrations are associated with increases in drinking water contaminants, and increases in PM2.5 are associated with increases in cardiovascular disease. Negative values show the opposite: increases in drinking water contaminants and increases in pesticide concentrations are associated with decreases in PM2.5 concentrations. From this data, we can see that lower pesticide use and lower PM2.5 are associated with reduced rates of cardiovascular disease. Drinking water is the exception: higher contaminant scores are associated with slightly lower rates of cardiovascular disease, so a cleaner environment does not line up with better outcomes for every indicator.
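A correlation matrix like the one discussed here can be produced with base R's `cor()`. The sketch below uses simulated data and hypothetical column names, so the numbers it prints do not correspond to the CalEnviroScreen values.

```r
# Simulated data for illustration only; not the CalEnviroScreen values.
set.seed(1)
n    <- 200
pm25 <- rnorm(n, mean = 8.5, sd = 1)

indicators <- data.frame(
  pm25       = pm25,
  pesticides = rexp(n, rate = 0.1),
  cardio     = 2 * pm25 + rnorm(n, sd = 2)  # built to correlate with pm25
)

# Introduce a few missing values, as real indicator tables often have.
indicators$pesticides[c(3, 17)] <- NA

# Pairwise Pearson correlations, using all complete pairs for each cell.
corr <- cor(indicators, use = "pairwise.complete.obs")
print(round(corr, 2))
```

Each cell ranges from -1 to 1, and the sign is read exactly as described: positive when the two variables rise together, negative when one rises as the other falls.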
Indicator definition: Total pounds of 83 selected active pesticide ingredients (filtered for hazard and volatility) used in production-agriculture per square mile, averaged over three years (2016 to 2018).
There appears to be a slight correlation between higher concentrations of pesticides and higher rates of cardiovascular disease.
##
## Call:
## lm(formula = ces4_bay_data$"Cardiovascular Disease" ~ Pesticides,
## data = ces4_bay_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.760 -2.920 -0.721 2.175 12.780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.649485 0.099720 106.794 < 2e-16 ***
## Pesticides 0.003822 0.001339 2.855 0.00436 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.918 on 1576 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.005146, Adjusted R-squared: 0.004515
## F-statistic: 8.152 on 1 and 1576 DF, p-value: 0.004357
The R squared value is quite low. Variation in pesticides only explains 0.51% of variation in cardiovascular disease. There is a slightly positive correlation between the two indicators (slope of 0.003822 with standard error 0.001339). The p-value of 0.00436 is below 0.05, meaning the results are statistically significant.
The residuals are centered near zero (median -0.721) and the curve is fairly symmetric, though the upper tail is longer (maximum 12.780 versus minimum -6.760).
Because the R squared is so low, we can conclude that pesticides are a poor predictor of cardiovascular disease.
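As a consistency check, the figures in the pesticide regression summary can be tied together by hand. For a regression with a single predictor, the t value is the estimate divided by its standard error, the F-statistic is the square of that t value, and R squared follows from F and the residual degrees of freedom. The symbols below are introduced only for this illustration, and small differences from the printed output come from rounding.

```latex
\begin{align*}
t   &= \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)}
     = \frac{0.003822}{0.001339} \approx 2.85 \\
F   &= t^2 \approx 8.15 \\
R^2 &= \frac{F}{F + \mathrm{df}_{\mathrm{resid}}}
     = \frac{8.152}{8.152 + 1576} \approx 0.0051
\end{align*}
```

This is why a result can be statistically significant and still explain almost none of the variation: with roughly 1,600 tracts, even a very small R squared yields a t value well above 2.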
Indicator definition: Drinking water contaminant index for selected contaminants, 2011 to 2019
Here there appears to be a negative correlation between drinking water contaminants and cardiovascular disease.
##
## Call:
## lm(formula = ces4_bay_data$"Cardiovascular Disease" ~ ces4_bay_data$"Drinking Water",
## data = ces4_bay_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0841 -2.9093 -0.6266 2.1625 12.8523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.6601704 0.2442778 47.733 < 2e-16 ***
## ces4_bay_data$"Drinking Water" -0.0034656 0.0008067 -4.296 1.85e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.911 on 1576 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.01157, Adjusted R-squared: 0.01095
## F-statistic: 18.45 on 1 and 1576 DF, p-value: 1.847e-05
The R squared for drinking water is higher than that for pesticides, at 0.0116. This means that variation in drinking water contaminants explains 1.16% of variation in cardiovascular disease. The slope is -0.0034656 with a standard error of 0.0008067. The standard error cannot be compared directly with the one for pesticides, because the two indicators are measured on different scales; the larger t value in absolute terms (4.296 versus 2.855) and the higher R squared are what indicate a somewhat better fit. The p-value is 1.85e-05, again making these results statistically significant.
So while this regression is a better fit than the regression with pesticides, meaning that drinking water quality is a better predictor of cardiovascular disease, the correlation between the two indicators, as shown by R squared, is still quite low.
The residuals look somewhat symmetrically distributed around zero if you look at the summary alone, but the plot shows that the curve is skewed.
Indicator definition: Annual mean concentration of PM2.5 (weighted average of measured monitor concentrations and satellite observations, µg/m3), over three years (2015 to 2017).
There is a positive correlation between PM2.5 concentrations and rates of cardiovascular disease.
##
## Call:
## lm(formula = log(ces4_bay_data$"Cardiovascular Disease") ~ ces4_bay_data$PM2.5,
## data = ces4_bay_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.94804 -0.25783 -0.00379 0.23135 0.84283
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.76764 0.12438 14.211 < 2e-16 ***
## ces4_bay_data$PM2.5 0.06343 0.01463 4.336 1.54e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3576 on 1578 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01177, Adjusted R-squared: 0.01115
## F-statistic: 18.8 on 1 and 1578 DF, p-value: 1.545e-05
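This is the highest R squared so far, showing that variation in PM2.5 explains 1.18% of variation in the log of the cardiovascular disease rate. Because this model uses a log-transformed outcome, its R squared is not strictly comparable with the two previous models, and the margin over drinking water (0.01177 versus 0.01157) is very small. The p-value is the lowest of the three indicators at 1.54e-05, again statistically significant.
Residuals are centered at zero and relatively symmetric.
From this, we see that PM2.5 is, by a narrow margin, the best predictor of cardiovascular disease among the three environmental indicators, but it is still a very weak predictor.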
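Because the PM2.5 model was fit to the log of the cardiovascular disease rate, its slope is read as a multiplicative effect rather than an additive one. The sketch below converts the reported coefficients back to the original scale; the PM2.5 values of 8 and 9 are arbitrary examples, and the variable and function names are introduced here for illustration.

```r
# Coefficients copied from the PM2.5 regression output in this report.
intercept <- 1.76764
slope     <- 0.06343

# Predicted rate on the original scale (ED visits per 10,000) for a given PM2.5.
# R's log() is the natural log, so exp() undoes the transformation.
predict_rate <- function(pm25) exp(intercept + slope * pm25)

# Multiplicative effect of a 1 ug/m3 increase, and the same as a percent change.
rate_ratio <- exp(slope)
pct_change <- 100 * (rate_ratio - 1)

# Arbitrary example PM2.5 values, chosen only for illustration.
print(predict_rate(c(8, 9)))
print(round(pct_change, 2))
```

Under this model, each additional 1 µg/m3 of PM2.5 is associated with a predicted rate about 6.5% higher, regardless of the starting concentration.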
Now, instead of looking at environmental indicators, we can bring in our race and healthcare data from Part 2 and consider these variables as predictors of cardiovascular disease.
Here we see a positive correlation between the percent of non-white people in a tract and rates of cardiovascular disease.
Here you can toggle between the percent of non-white people and rates of cardiovascular disease in each census tract.
##
## Call:
## lm(formula = bay_ces4_race$"Cardiovascular Disease" ~ bay_ces4_race$PERCENT,
## data = bay_ces4_race)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3266 -2.7564 -0.7992 2.1612 13.5946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.027831 0.225707 35.57 <2e-16 ***
## bay_ces4_race$PERCENT 0.055021 0.004239 12.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.735 on 1572 degrees of freedom
## Multiple R-squared: 0.09682, Adjusted R-squared: 0.09624
## F-statistic: 168.5 on 1 and 1572 DF, p-value: < 2.2e-16
The R squared here shows that variations in the percent of nonwhite people can explain 9.68% of variation in cardiovascular disease, making this the best predictor we’ve encountered so far. The slope of the regression line is 0.055, meaning that as the percent of nonwhite people increases, so does the rate of cardiovascular disease. We also see the lowest p-value yet, below 2.2e-16, which is statistically significant.
The takeaway from this is that race, and specifically the percent of non-white people in an area, is better than pesticide concentration, drinking water contaminants, or PM2.5 concentrations at explaining variations in rates of cardiovascular disease.
Here we see a positive correlation between the percent of people without healthcare and the rate of cardiovascular disease.
##
## Call:
## lm(formula = bay_ces4_hc$"Cardiovascular Disease" ~ bay_ces4_hc$PERCENT,
## data = bay_ces4_hc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7818 -2.6967 -0.6663 2.0977 12.7955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.88545 0.14594 60.88 <2e-16 ***
## bay_ces4_hc$PERCENT 0.40679 0.02554 15.93 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.647 on 1572 degrees of freedom
## Multiple R-squared: 0.1389, Adjusted R-squared: 0.1384
## F-statistic: 253.6 on 1 and 1572 DF, p-value: < 2.2e-16
Here we see an R squared higher than that for race. Variation in the percent of people without health insurance explains 13.89% of the variation in rates of cardiovascular disease. The slope of the regression line is 0.40679, which is roughly seven times the slope for percent nonwhite individuals (0.055021). Both predictors appear to be measured in percentage points, so the slopes can be compared directly: a one-point increase in the percent of people without health insurance is associated with an increase of about 0.41 in the cardiovascular disease rate, compared with about 0.06 for a one-point increase in the percent of nonwhite people. This is an association, and on its own it does not show that lacking health insurance causes cardiovascular disease.
Knowing that cardiovascular disease is measured by emergency room visits, we can imagine that the actual rate of people with cardiovascular disease but without health insurance might be even higher, making this a surprising finding and one of great interest for further exploration.
Now, let’s map the residuals of health insurance as a predictor of cardiovascular disease.
A negative residual suggests that our regression model is over-estimating the rates of cardiovascular disease given the percent of people without health insurance in these areas. The lowest residuals occur in areas like downtown San Francisco and large parts of Marin and San Mateo counties. This suggests that in these areas there are other factors that are better predictors of cardiovascular disease than healthcare coverage.
A positive residual suggests that we actually see more cases of cardiovascular disease than expected given the percent of people without health insurance. These generally occur in the areas where rates of cardiovascular disease were highest to begin with. So this means that in areas where there are high rates of cardiovascular disease, looking at the percent of people without health insurance alone will lead to an under-prediction of cases of cardiovascular disease.
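The sign convention for the residuals can be checked with a small sketch. The data below are simulated and the column names are hypothetical; only the logic (residual = observed minus predicted) mirrors the analysis described here.

```r
# Simulated tract data for illustration only.
set.seed(42)
n             <- 100
pct_uninsured <- runif(n, min = 0, max = 20)
cardio_rate   <- 9 + 0.4 * pct_uninsured + rnorm(n, sd = 3.5)
tracts        <- data.frame(pct_uninsured, cardio_rate)

# Simple linear regression of disease rate on percent uninsured.
fit <- lm(cardio_rate ~ pct_uninsured, data = tracts)

# Residual = observed rate minus the rate the model predicts.
tracts$residual <- unname(residuals(fit))

# Negative residual: the model over-predicts; positive: it under-predicts.
tracts$model_error <- ifelse(tracts$residual < 0,
                             "over-predicted", "under-predicted")

print(table(tracts$model_error))
```

Joining a `residual` column like this one back to tract geometries is one way such a map could be drawn, with the color scale centered at zero.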
Finally, we will look at the intersecting identities of race, immigration status, and multigenerational households to see how this may relate to health insurance in the Bay Area. Multigenerational households are a higher risk group because of the age of the members of the household. For example, you may have had personal experience during the COVID-19 pandemic where you or someone you know had to be extra cautious because they had an elderly relative living with them. For these high risk populations, not having access to healthcare makes them even more vulnerable. Here we will use PUMS data to understand how these identities relate.
Percent of white and non-white immigrant multigenerational households with health insurance coverage in Bay Area PUMAS
Difference between the percent of white and non-white immigrant multigenerational households with health insurance coverage in Bay Area PUMAS
This map shows a similar pattern to the map that looked only at race and health insurance. We see that there is more area where the percent of white households with health insurance is lower than the percent of nonwhite households. However, in the areas where the percent of nonwhite households with insurance is lower, the difference between the two groups is greater in magnitude. This map is useful in understanding that race may be a more important factor to look at in terms of healthcare inequity than whether or not an individual was born in the U.S. or lives in a multigenerational household.
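PUMS records are samples, so percentages are normally computed with the survey weights rather than by counting rows. The sketch below shows a weighted coverage rate per PUMA and group using made-up household records; the column names (`puma`, `group`, `insured`, `weight`) are hypothetical stand-ins and not the actual PUMS variable names or the code used for the maps.

```r
# Hypothetical household records, for illustration only.
households <- data.frame(
  puma    = c("P1", "P1", "P1", "P2", "P2", "P2"),
  group   = c("white", "nonwhite", "nonwhite", "white", "white", "nonwhite"),
  insured = c(1, 1, 0, 0, 1, 1),        # 1 = household has coverage
  weight  = c(50, 30, 20, 40, 60, 80)   # households each record represents
)

# Weighted percent insured within one subset of records.
weighted_pct <- function(d) 100 * sum(d$insured * d$weight) / sum(d$weight)

# One subset per PUMA-by-group combination, named like "P1.white".
groups      <- split(households, list(households$puma, households$group))
pct_insured <- sapply(groups, weighted_pct)

print(pct_insured)
```

The white minus non-white difference for each PUMA can then be taken from these values in the same way as the tract-level difference.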
Sources
David U. Himmelstein, Robert M. Lawless, Deborah Thorne, Pamela Foohey, and Steffie Woolhandler, 2019: Medical Bankruptcy: Still Common Despite the Affordable Care Act. American Journal of Public Health 109, 431–433, https://doi.org/10.2105/AJPH.2018.304901
“What Are the 12 Leading Causes of Death in the United States?” Healthline, November 1, 2018. https://www.healthline.com/health/leading-causes-of-death
Cosselman, K., Navas-Acien, A. & Kaufman, J. Environmental factors in cardiovascular disease. Nat Rev Cardiol 12, 627–642 (2015). https://doi.org/10.1038/nrcardio.2015.152